Semi-supervised Clustering with Pairwise Constraints: A Discriminative Approach
نویسنده
چکیده
We consider the semi-supervised clustering problem where we know (with varying degree of certainty) that some sample pairs are (or are not) in the same class. Unlike previous efforts in adapting clustering algorithms to incorporate those pairwise relations, our work is based on a discriminative model. We generalize the standard Gaussian process classifier (GPC) to express our classification preference. To use the samples not involved in pairwise relations, we employ the graph kernels (covariance matrix) based on the entire data set. Experiments on a variety of data sets show that our algorithm significantly outperforms several state-of-the-art methods.
منابع مشابه
Discriminative Similarity for Clustering and Semi-Supervised Learning
Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning...
متن کاملSemi-supervised Clustering of Graph Objects: A Subgraph Mining Approach
Semi-supervised clustering has recently received a lot of attention in the literature, which aims to improve the clustering performance with limited supervision. Most existing semi-supervised clustering studies assume that the data is represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those semi-su...
متن کاملActive Learning of constraints using incremental approach in semi-supervised clustering
Semi-supervised clustering aims to improve clustering performance by considering user-provided side information in the form of pairwise constraints. We study the active learning problem of selecting must-link and cannot-link pairwise constraints for semi-supervised clustering. We consider active learning in an iterative framework; each iteration queries are selected based on the current cluster...
متن کاملClustering from Multiple Uncertain Experts
Utilizing expert input often improves clustering performance. However in a knowledge discovery problem, ground truth is unknown even to an expert. Thus, instead of one expert, we solicit the opinion from multiple experts. The key question motivating this work is: which experts should be assigned higher weights when there is disagreement on whether to put a pair of samples in the same group? To ...
متن کاملFuzzy Clustering with Pairwise Constraints for Knowledge-Driven Image Categorization
The identification of categories in image databases usually relies on clustering algorithms that only exploit the feature-based similarities between images. The addition of semantic information should help improving the results of the categorization process. Pairwise constraints between some images are easy to provide, even when the user has a very incomplete prior knowledge of the image catego...
متن کامل